Explanations
expressions of personal opinion and social commentary
Negative Logits
ivar    -0.15
utow    -0.15
ладÑĥ   -0.14
reu     -0.14
reon    -0.13
hus     -0.13
iesel   -0.13
lea     -0.13
vsp     -0.13
dea     -0.13
Positive Logits
fine    0.35
fine    0.32
Fine    0.27
Fine    0.27
tough   0.25
screw   0.24
FINE    0.23
deal    0.23
Tough   0.21
go      0.20
Activations Density 0.236%