Explanations
references to historical dates and temporal contexts
Negative Logits
tevens      -0.56
său         -0.53
nemlig      -0.49
deoarece    -0.48
např        -0.48
幸いです    -0.48
blijkt      -0.48
znacznie    -0.47
dezelve     -0.47
aldus       -0.46
Positive Logits
somebody     0.61
gonna        0.60
somebody     0.60
Somebody     0.57
everybody    0.57
really       0.56
really       0.54
Somebody     0.52
fucking      0.52
guys         0.52
Activations Density: 0.876%