INDEX
Explanations
mentions of specific entities such as names or locations
acronyms or abbreviations in the text
Negative Logits
oxide    -0.68
Legions  -0.64
vae      -0.61
Chimera  -0.60
Jolly    -0.59
Sic      -0.58
©¶æ      -0.58
Balk     -0.57
midt     -0.56
Avenger  -0.56
Positive Logits
FUL     1.00
INGS    0.97
TAIN    0.95
HAEL    0.94
IMAGES  0.93
ICLE    0.93
AGE     0.90
RIC     0.86
hyde    0.85
IER     0.82
Activations Density: 0.118%