INDEX
Explanations
phrases in a specific language or with a unique character pattern
special characters and non-standard punctuation
Negative Logits
Stras        -0.76
ellen        -0.70
utterstock   -0.69
ignment      -0.69
ategic       -0.66
ouched       -0.65
ileaks       -0.65
wart         -0.63
Wichita      -0.63
warts        -0.62
Positive Logits
âĸĵ          1.08
DIT          0.98
ĵ            0.95
¡            0.93
BLE          0.89
±            0.87
æµ           0.84
×Ļ           0.84
×ij           0.83
uses         0.83
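Several of the positive-logit tokens above (for example âĸĵ, ×Ļ, ×ij) look like raw bytes rendered through a byte-to-character table rather than readable text. The sketch below assumes, without confirmation from this page, that the tokens come from a GPT-2-style byte-level BPE vocabulary, and maps each displayed character back to its byte. The helper names `bytes_to_unicode`, `token_to_bytes`, and `describe` are illustrative only.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer uses this mapping; the page does not say.
    """
    # Bytes that already print cleanly keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Return the decoded text, or spaced hex if the bytes are not complete UTF-8."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.hex(" ")


if __name__ == "__main__":
    for tok in ["\u00e2\u0138\u0135", "\u00d7\u013b", "\u00d7\u0133", "\u0135", "\u00e6\u00b5"]:
        print(repr(tok), "->", describe(tok))
```

Under that assumption, the top token decodes to a block-shading character and the two tokens starting with × decode to Hebrew letters, while the single-character and two-character entries such as ĵ and æµ are fragments of multi-byte characters. That reading is consistent with the two explanations listed at the top of the page.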
Activations Density 0.005%
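As a rough illustration of what a density of 0.005% means, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature's activation exceeds zero. That definition is an assumption rather than something stated on this page, and the function name and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is measured per token over a sample of activations.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 5 of 100,000 tokens.
    sample = [0.0] * 99_995 + [1.2, 0.4, 3.1, 0.9, 2.2]
    density = activation_density(sample)
    print(f"{density * 100:.3f}%")
```

Under that reading, a density of 0.005% corresponds to about one active token in every 20,000.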