INDEX
Explanations
phrases related to actions and events
phrases indicating evaluation or response to events
Negative Logits
THAT       -0.66
incent     -0.60
folks      -0.60
THESE      -0.57
ixties     -0.56
THIS       -0.56
inese      -0.56
ividual    -0.56
inian      -0.55
Those      -0.55
Positive Logits
it         0.85
its        0.79
Its        0.65
Its        0.64
theirs     0.62
llah       0.60
its        0.59
uve        0.59
ADRA       0.57
Paddock    0.57
Activations Density: 1.010%