Explanations
emotional expressions and sentiments related to personal beliefs and moral judgments
Negative Logits
roup      -0.17
olik      -0.15
immers    -0.15
estroy    -0.15
Všech     -0.14
ekil      -0.14
ammers    -0.14
ovan      -0.14
Flake     -0.14
ewise     -0.14
Positive Logits
Antworten  0.14
jadi       0.14
stir       0.14
GY         0.14
sand       0.14
Clamp      0.14
pus        0.13
bet        0.13
ataset     0.13
sr         0.13
Activations Density: 0.225%