INDEX
Explanations
facts or statements that are particularly important or noteworthy
repeated mentions of the phrase "the fact" in various contexts
Negative Logits
avorite     -0.86
wana        -0.72
airs        -0.69
itsch       -0.69
asca        -0.66
iverpool    -0.63
ESE         -0.63
artney      -0.62
annis       -0.61
livest      -0.61
Positive Logits
ually       1.14
uality      1.13
ional       1.07
orial       1.02
uate        0.84
itious      0.81
oids        0.79
uation      0.78
uated       0.77
uates       0.75
Activations Density: 0.025%