INDEX
Explanations
references to a person named "Jo."
mentions of the name "Jo."
Negative Logits
IFIED           -0.74
enegger         -0.73
flies           -0.72
amplification   -0.70
CLASS           -0.70
ashtra          -0.68
IMAGES          -0.66
interests       -0.66
intendent       -0.65
oğ              -0.64
Positive Logits
aquin           1.33
Jo              1.29
jo              1.13
ining           1.03
aqu             0.97
zeb             0.96
anne            0.95
anna            0.95
areth           0.91
zy              0.91
Activations Density 0.011%