INDEX
Explanations
texts related to personal experiences or interests
discussions about personal interests and experiences
Negative Logits
discredited   -0.84
outraged      -0.79
alleged       -0.78
repud         -0.77
accuser       -0.77
retract       -0.76
disputed      -0.75
equivalent    -0.74
retracted     -0.74
dismant       -0.73
Positive Logits
Favorite      1.28
Growing       1.23
Recently      1.21
Recently      1.16
Being         1.12
My            1.11
Growing       1.10
haha          1.10
Learning      1.08
Working       1.07
Activations Density 0.402%