INDEX
Explanations
instances of emotional or evaluative language related to events or actions
Negative Logits
ationale   -0.08
nackte     -0.07
itoris     -0.07
ylko       -0.07
asal       -0.07
onga       -0.07
екар       -0.07
edback     -0.07
tiener     -0.07
hai        -0.07
Positive Logits
finally      0.21
Lastly       0.21
Lastly       0.20
Finally      0.19
finally      0.17
Finally      0.16
overall      0.15
altogether   0.14
Overall      0.13
all          0.13
Activations Density 0.199%