Explanations
Phrases suggesting doubt or questioning of perceptions
Appearance or perception not matching reality
Appearance versus reality
Negative Logits
AndEndTag          -0.46
ConstraintMaker    -0.45
الحياه             -0.44
AnchorStyles       -0.43
preventative       -0.42
defaultstate       -0.42
المناصب            -0.41
erapeutics         -0.41
]")]               -0.41
SequentialGroup    -0.41
Positive Logits
decep              0.47
以为               0.45
以為               0.45
EDEFAULT           0.45
superfic           0.42
misconception      0.40
facade             0.39
misconceptions     0.39
deceiving          0.39
看似               0.38
Activation Density: 0.380%