INDEX
Explanations
references to specific names, potentially related to a person's name
references to various types of food
Negative Logits
ansas     -0.70
ned       -0.68
depress   -0.67
uit       -0.67
stream    -0.66
wegian    -0.64
blind     -0.63
locked    -0.62
meet      -0.62
pir       -0.62
Positive Logits
zzi       1.11
zzo       1.11
orno      0.99
otti      0.98
zza       0.98
ucci      0.95
Äĩ        0.95
olini     0.92
arella    0.91
Rossi     0.89
Activations Density 0.042%