INDEX
Explanations
instances of a specific Italian phrase
occurrences of the word "lla."
Negative Logits
ij士      -0.92
citiz    -0.83
States   -0.80
states   -0.73
unden    -0.71
livious  -0.68
ß        -0.67
lay      -0.66
LESS     -0.66
Þ        -0.66
Positive Logits
ppo      1.09
lla      1.07
uthor    0.97
ppa      0.97
zzo      0.96
ppe      0.95
ignt     0.82
ñ        0.82
quist    0.82
Rosa     0.80
Activations Density 0.011%