INDEX
Explanations
words related to negative actions or misunderstandings
terms related to mistakes or misunderstandings
NEGATIVE LOGITS
cake         -0.64
frey         -0.60
CHAT         -0.59
stones       -0.57
ModLoader    -0.55
uphill       -0.55
instein      -0.55
stakes       -0.55
hubs         -0.54
chat         -0.54
POSITIVE LOGITS
vous         1.05
ceived       0.90
ceptions     0.89
akable       0.78
ventures     0.78
omers        0.76
gotten       0.75
rued         0.73
inations     0.73
gments       0.71
Activations Density 0.045%