INDEX
Explanations
phrases related to specific terms and technical jargon
references to the term "guilty" and its variations
Negative Logits
hips         -0.85
marked       -0.82
umbledore    -0.80
weeney       -0.79
ikarp        -0.78
killer       -0.74
rider        -0.74
icing        -0.72
fully        -0.70
battle       -0.68
Positive Logits
ibi          0.94
ilty         0.79
Netanyahu    0.78
acl          0.77
ÄŁ           0.74
ose          0.74
ffer         0.73
vernment     0.70
Mesh         0.68
Hasan        0.66
Activations Density: 0.007%