INDEX
Explanations
places or events with names that end in 'atra'
references to specific locations or cultural contexts
Negative Logits
Token           Logit
cre             -0.68
prob            -0.66
pool            -0.65
shades          -0.63
WR              -0.62
confirmation    -0.62
stud            -0.61
Inf             -0.61
Vol             -0.60
Cong            -0.58
Positive Logits
Token           Logit
atra            4.96
atri            1.31
arta            1.18
ahar            1.05
andra           1.04
atism           1.02
asca            0.98
agra            0.97
peria           0.95
atana           0.95
Activations Density: 0.004%