INDEX
Explanations
expressions of personal opinion
Negative Logits
perty       -0.74
Yourself    -0.65
ufact       -0.64
Stuff       -0.61
theless     -0.60
hide        -0.58
ancial      -0.57
Creature    -0.57
Ton         -0.56
arsity      -0.56
Positive Logits
anyway          0.68
phas            0.68
anyways         0.66
unres           0.62
opian           0.59
âĶĢâĶĢâĶĢâĶĢ    0.58
terminating     0.57
it              0.57
boils           0.56
galitarian      0.56
Activations Density 0.037%
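
The positive-logit token `âĶĢâĶĢâĶĢâĶĢ` is probably not extraction damage but the vocabulary's own spelling of a non-ASCII token. The sketch below assumes the model uses a GPT-2-style byte-level BPE vocabulary, which the page itself does not state, and shows how such a string could be mapped back to readable text. The helper names `bytes_to_unicode` and `decode_byte_level_token` are illustrative only.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Printable bytes map to themselves; every other byte is assigned a
    code point starting at 256, in increasing byte order.
    """
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = keep[:]
    code_points = keep[:]
    shifted = 0
    for b in range(256):
        if b not in keep:
            byte_values.append(b)
            code_points.append(256 + shifted)
            shifted += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level_token(token: str) -> str:
    """Map a vocabulary string back to the UTF-8 text it stands for."""
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The token as displayed in the table, written with escapes
# (U+00E2, U+0136, U+0122 repeated four times).
displayed = "\u00e2\u0136\u0122" * 4
print(repr(decode_byte_level_token(displayed)))
```

Under that assumption, the token decodes to a run of four box-drawing horizontal lines (U+2500), the kind of rule often used as a text separator. If the model uses a different tokenizer, this mapping does not apply.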