Explanations
mentions of locations or specific names
Negative Logits
sburgh     -0.90
wrench     -0.77
Kinn       -0.65
ually      -0.64
vironment  -0.64
Takeru     -0.63
ettings    -0.61
urally     -0.60
ysis       -0.59
İĭ         -0.59
Positive Logits
aday   1.09
riers  1.06
ouk    1.04
rier   1.02
thing  0.98
rer    0.92
agher  0.89
rug    0.85
bent   0.84
rak    0.83
Activations Density 0.014%