INDEX
Explanations
references to specific geographic locations or landmarks
Negative Logits
itſelf         -0.92
Barbier        -0.89
שוליים         -0.74
myſelf         -0.74
Blon           -0.73
Muriel         -0.72
gynhyrchwyd    -0.71
Monfieur       -0.70
begre          -0.70
themſelves     -0.69
Positive Logits
Kings          0.81
sap            0.79
Hin            0.75
sag            0.73
uskas          0.73
gatsby         0.73
Hin            0.72
γέν            0.72
she            0.71
Alvarado       0.71
Activations Density 0.005%