INDEX
Explanations
proper nouns, particularly names of individuals
proper names, particularly of individuals and relevant entities
NEGATIVE LOGITS
nings       -0.72
saline      -0.72
lockout     -0.69
£ı          -0.65
ï¸          -0.65
à¼          -0.63
theft       -0.63
Tycoon      -0.62
showers     -0.61
ãĥ¼ãĥĨ      -0.61
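Tokens such as ãĥ¼ãĥĨ, ï¸, à¼ and £ı in the negative list are probably not encoding errors. They look like byte-level BPE tokens shown through GPT-2's reversible byte-to-character table. This is an assumption based on the characters used, since the dashboard does not name its tokenizer. The sketch below rebuilds that table, mirroring the bytes_to_unicode helper in OpenAI's GPT-2 encoder.py, and maps a displayed token back to its raw bytes. The name decode_token is hypothetical, chosen for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    # Bytes that are already printable characters keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte (controls, space, 0x7F-0xA0, 0xAD) is assigned
    # code points 256, 257, ... in increasing byte order.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed BPE token back into text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token holding only part of a multi-byte UTF-8 character cannot be
    # decoded on its own; the incomplete bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

Under this assumption, decode_token("ãĥ¼ãĥĨ") returns "ーテ" (two Japanese katakana characters), while ï¸, à¼ and £ı each carry only a fragment of a multi-byte character and decode to replacement characters when taken alone.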
POSITIVE LOGITS
hou          0.94
enegger      0.93
leton        0.92
borough      0.89
atari        0.88
schild       0.83
rov          0.79
hani         0.78
ront         0.78
Leaks        0.77
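SAE dashboards commonly produce positive and negative logit lists like these by projecting the feature's decoder direction through the model's unembedding matrix W_U and keeping the largest and smallest entries. Assuming this dashboard does the same (it does not say whether it applies the final layer norm or any normalization, so the scale of values such as 0.94 may differ), a minimal pure-Python sketch follows. The function name and argument layout are hypothetical.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by the feature's direct effect on their logits.

    decoder_dir: list[float] of length d_model (the feature's decoder row).
    unembed:     d_model rows, each a list[float] of length vocab_size (W_U).
    vocab:       list[str] of length vocab_size.
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    # score[t] = sum_d decoder_dir[d] * W_U[d][t], i.e. decoder_dir @ W_U
    for d, weight in enumerate(decoder_dir):
        row = unembed[d]
        for t in range(vocab_size):
            scores[t] += weight * row[t]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative scores first
    top_positive = ranked[::-1][:k]  # most positive scores first
    return top_positive, top_negative
```

For example, with vocab ["a", "b", "c"], decoder direction [1.0, -1.0] and W_U rows [0.5, 0.0, -0.2] and [0.1, 0.3, 0.0], the scores are 0.4, -0.3 and -0.2, so with k=1 the top positive token is "a" and the top negative token is "b".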
Activation density: 0.097%
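Activation density is read here as the fraction of evaluated tokens on which the feature's activation is positive. Both the threshold and the token sample are assumptions, since the dashboard does not state them. At 0.097%, the feature fires on roughly one token in a thousand. A minimal sketch, with hypothetical function names:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_percent(fraction, digits=3):
    """Render a fraction as a percentage string, e.g. 0.00097 -> '0.097%'."""
    return f"{fraction * 100:.{digits}f}%"
```

For example, a feature that is positive on 97 of 100,000 tokens gives format_percent(activation_density(acts)) == "0.097%".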