INDEX
Explanations
mentions of ongoing actions or events
Negative Logits
ernet      -0.20
azor       -0.17
stra       -0.15
exus       -0.15
aldi       -0.14
acus       -0.14
inte       -0.14
ısından    -0.14
urette     -0.14
states     -0.14
Positive Logits
wrong      0.27
happen     0.24
happening  0.23
bump       0.23
-ons       0.21
happened   0.20
ons        0.20
wrong      0.19
down       0.19
happens    0.19
Activations Density: 0.017%