INDEX
Explanations
phrases related to legal or political controversies
references to significant events following a tragedy
Negative Logits
lav         -0.87
Concent     -0.85
leans       -0.77
nel         -0.77
lain        -0.76
eneg        -0.72
Nights      -0.72
utral       -0.72
Wide        -0.70
izontal     -0.68
Positive Logits
天          0.71
samurai     0.66
endi        0.64
delinquent  0.64
ho          0.63
ername      0.62
fitness     0.62
Tok         0.62
cos         0.61
prediction  0.60
Activations Density: 0.000%