INDEX
Explanations
adjectives and nouns relating to logical reasoning
logical reasoning and arguments
Negative Logits
lain       -0.86
adal       -0.75
emi        -0.72
chuk       -0.68
rael       -0.68
andals     -0.67
orks       -0.67
sung       -0.66
toured     -0.64
Volunte    -0.64
Positive Logits
posit          1.01
inference      0.94
deduction      0.90
necessity      0.89
fallacy        0.87
deductions     0.86
progression    0.86
istically      0.86
istic          0.84
\\\\\\\\       0.83
Activations Density 0.014%