Explanations
expressions of affirmation or agreement
Negative Logits
]--;          -0.81
WriteBarrier  -0.80
NUMX          -0.79
'\\;'         -0.78
aarrggbb      -0.78
".            -0.75
TestBed       -0.75
pozdrawiam    -0.73
_             -0.73
),            -0.71
Positive Logits
Well    2.04
Well    2.02
WELL    1.25
well    1.21
WELL    1.11
Welp    1.03
well    0.90
Ну      0.85
Okay    0.83
Welles  0.83
Activations Density 0.044%
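
The page does not define "Activations Density". One common reading, assumed here, is the share of sampled token positions on which the feature activates at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where
    the feature fires; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.3, 0.2, 2.7, 0.9]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.044% would correspond to the feature firing on roughly 4 or 5 of every 10,000 token positions.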