INDEX
Explanations
phrases indicating general applicability or broader concepts
Negative Logits
Token        Logit
providedIn   -1.01
itſelf       -0.91
myſelf       -0.89
Exactos      -0.88
themſelves   -0.87
Exacts       -0.86
Reſ          -0.82
becauſe      -0.79
ſtate        -0.78
Monfieur     -0.77
Positive Logits
Token        Logit
earlier      0.63
similarly    0.62
broader      0.61
simpler      0.59
もっと       0.58
dagegen      0.57
other        0.56
earlier      0.55
tler         0.55
bardziej     0.54
Activations Density: 0.383%