INDEX
Explanations
references to specific names, potentially related to people or places
proper nouns, specifically names related to individuals or characters
Negative Logits
Ŀ          -0.70
¤          -0.68
selves     -0.67
Cortana    -0.64
Lisbon     -0.64
ĩ          -0.63
tics       -0.62
pron       -0.62
ı          -0.61
WATCHED    -0.61
Positive Logits
bard       1.25
Roth       1.13
roth       0.99
stein      0.98
haar       0.95
igan       0.89
punk       0.88
schild     0.88
heit       0.87
leigh      0.87
Activations Density 0.009%