INDEX
Explanations
proper nouns or names of educational institutions
mentions of a specific name, likely a character or entity
Negative Logits
uters         -0.67
mercial       -0.67
ablishment    -0.63
eared         -0.60
PLA           -0.60
utable        -0.60
Overt         -0.58
pport         -0.58
carefully     -0.58
Communities   -0.57
Positive Logits
arth          1.13
ritis         1.08
osaurus       0.85
locks         0.83
Vader         0.83
\\\\          0.82
rils          0.82
rums          0.81
neau          0.80
alia          0.80
Activations Density: 0.010%