INDEX
Explanations
mentions of revenge and related concepts
Negative Logits
GBT              -0.15
lạc              -0.14
amel             -0.14
away             -0.14
adia             -0.14
getApplication   -0.14
warz             -0.13
isations         -0.13
UnderTest        -0.13
uckles           -0.13
Positive Logits
against          0.17
wheel            0.15
loh              0.14
альні            0.14
plier            0.14
Favor            0.14
asl              0.14
ző               0.14
Fairfield        0.14
junior           0.14
Activations Density 0.024%