INDEX
Explanations
proper nouns or specific names
instances of the word "real" and discussions about what is genuine or authentic
Negative Logits
hire        -0.90
azard       -0.83
utics       -0.78
zy          -0.77
xual        -0.76
azel        -0.74
avorite     -0.74
emi         -0.74
theless     -0.73
mph         -0.72
Positive Logits
culprit     1.01
ignment     0.95
estate      0.89
isation     0.87
brunt       0.83
truth       0.78
igned       0.77
thing       0.77
impetus     0.77
embodiment  0.76
Activations Density 0.055%