Explanations
phrases indicating past actions or experiences
phrases indicating strong opposition or conflict
Negative Logits
rous        -0.62
upper       -0.62
ogical      -0.61
optic       -0.60
rial        -0.59
LIB         -0.58
ransom      -0.57
orus        -0.57
theaters    -0.57
ery         -0.56
Positive Logits
ĪĴ           0.91
lately       0.84
recent       0.82
now          0.78
hasn         0.71
progress     0.69
alach        0.69
ierrez       0.68
recently     0.66
ateur        0.66
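Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive scores. The sketch below assumes that method; the function name `top_bottom_logits`, the toy 2-d vectors, and the four-token vocabulary are hypothetical illustrations, not the dashboard's code or data.

```python
from typing import Dict, List, Tuple


def top_bottom_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector, then return the k most
    negative and the k most positive (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return negative, positive


# Hypothetical toy example; real models use d_model-sized vectors
# and a vocabulary of tens of thousands of tokens.
decoder_dir = [1.0, -0.5]
unembed = {
    "recently": [0.8, 0.1],
    "lately": [0.9, 0.0],
    "theaters": [-0.6, 0.2],
    "ransom": [-0.4, 0.3],
}
neg, pos = top_bottom_logits(decoder_dir, unembed, k=2)
print(neg)
print(pos)
```

Sorting the whole score table once and slicing both ends keeps the sketch short; for a full vocabulary a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid the full sort.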
Activations Density 0.724%
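Activation density is the share of sampled tokens on which the feature fires; 0.724% corresponds to roughly 7 tokens in every 1,000. The sketch below assumes density is measured as the percentage of tokens with activation strictly above zero; the threshold, the function name `activation_density`, and the sample data are assumptions for illustration, not the dashboard's implementation.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`
    (assumed definition: strictly greater than zero by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1,000 tokens, the feature is active on 7 of them.
sample = [0.0] * 993 + [1.2, 0.4, 2.1, 0.9, 0.3, 1.7, 0.5]
print(f"{activation_density(sample):.3f}%")
```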