INDEX
Explanations
phrases indicating observation or speculation
phrases indicating appearances or predictions about situations
Negative Logits
ocaust      -0.90
velength    -0.87
iqueness    -0.79
cial        -0.78
akable      -0.76
ategory     -0.71
cart        -0.71
utch        -0.66
cession     -0.65
together    -0.65
Positive Logits
Rasmussen   0.79
TOR         0.71
unlikely    0.68
slowing     0.65
FSA         0.65
Chimera     0.63
Melania     0.61
whoever     0.60
Devin       0.59
McCabe      0.59
Activations Density 0.128%