Explanations
statements that assert truthfulness
Negative Logits
tagHelperRunner    -0.73
yntaxException     -0.71
ⓧ                  -0.71
AppBundle          -0.70
Monfieur           -0.68
ſche               -0.68
myſelf             -0.65
незавершена        -0.65
المناصب            -0.64
DeleteBehavior     -0.63
Positive Logits
believers          0.84
False              0.81
False              0.80
believer           0.75
false              0.72
false              0.69
blue               0.64
north              0.63
colors             0.63
True               0.62
Activation Density: 0.086%