Explanations
References to various types of individuals, particularly the average or typical person in different contexts.
Negative Logits
otropic        -0.76
itars          -0.72
divisions      -0.70
ounding        -0.66
Shards         -0.65
disband        -0.64
overlapping    -0.64
acements       -0.63
subsidiaries   -0.62
giants         -0.62
Positive Logits
knows           1.06
instinctively   0.96
understands     0.96
who             0.93
learns          0.90
beware          0.89
realizes        0.86
who             0.85
thinks          0.83
expects         0.81
Activations Density: 0.126%