Explanations
words related to declarations or claims
references to declaration or classification terms
Negative Logits
BIP       -0.71
裏        -0.69
tremend   -0.67
awaru     -0.65
ギ        -0.64
ヴァ      -0.64
BALL      -0.63
κ         -0.62
DEM       -0.61
Bey       -0.61
Positive Logits
osures     1.09
avier      1.04
othes      1.04
osing      0.98
iff        0.98
ipper      0.97
inic       0.96
arent      0.95
utch       0.95
uster      0.94
Activations Density 0.012%
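The dashboard does not say how these numbers are produced. Below is a minimal sketch that assumes two common conventions, neither confirmed by this page. First, the logit scores are the dot product of the feature's decoder direction with each token's unembedding column. Second, activation density is the percentage of token positions where the feature fires (activation > 0). All names and the toy data here are hypothetical.

```python
def activation_density(activations):
    """Percentage of token positions where the feature is active (> 0).

    Assumption: "density" means the fraction of tokens with nonzero activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


def logit_effects(decoder_vec, unembed):
    """Dot product of a feature direction with each token's unembedding column.

    decoder_vec: list of floats (hypothetical feature decoder direction).
    unembed: dict mapping token -> list of floats (hypothetical unembedding columns).
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k highest, k lowest) (token, score) pairs, most extreme first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


if __name__ == "__main__":
    # Toy data: 12 active positions out of 100,000 gives 0.012%.
    acts = [1.0] * 12 + [0.0] * 99_988
    print(f"{activation_density(acts):.3f}%")

    effects = logit_effects(
        [1.0, 0.5],
        {"osures": [1.0, 0.2], "BIP": [-0.6, -0.2], "iff": [0.8, 0.4]},
    )
    positive, negative = top_and_bottom(effects, 2)
    print(positive, negative)
```

Under these assumptions, a density of 0.012% means the feature is active on roughly 12 of every 100,000 tokens. The two logit lists are simply the highest- and lowest-scoring ends of the same ranked vocabulary.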