INDEX
Explanations
names starting with "Jo" in various contexts
repeated mentions of the name "Jo."
Negative Logits
IFIED            -0.77
oÄŁ              -0.72
ashtra           -0.71
Wikimedia        -0.70
CLASS            -0.70
intendent        -0.68
amplification    -0.67
enegger          -0.67
flies            -0.66
overhead         -0.65
Positive Logits
Jo               1.31
aquin            1.26
jo               1.17
ining            0.97
Jo               0.96
anna             0.95
zeb              0.93
areth            0.91
ppa              0.91
aqu              0.88
Activations Density 0.009%
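
The density figure above can be read as the share of token positions on which the feature fires at all. The sketch below is one minimal way to compute such a number. It assumes density means "fraction of positions with activation above a threshold"; the function name `activation_density`, the threshold of 0.0, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    is active. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 9 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.009% corresponds to roughly 9 active positions per 100,000 tokens, which fits a feature tied to a single name fragment.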