INDEX
Explanations
words related to negative or critical contexts
words or terms related to surprise, anger, and various structured themes or events
Negative Logits
âĨij          -0.70
Leilan       -0.68
---------    -0.67
Naples       -0.64
ãĢĮ          -0.64
Eugene       -0.64
Florence     -0.63
UMP          -0.61
FN           -0.61
å¿           -0.60
Positive Logits
"            1.10
"],          1.09
"]           1.01
"!           1.00
",           0.98
":           0.97
tainment     0.96
"â̦          0.94
")           0.94
"-           0.93
Activations Density: 0.225%