INDEX
Explanations
references to logical reasoning and arguments
references to logic, particularly in legal and philosophical contexts
Negative Logits
semble     -0.70
orks       -0.70
Volunte    -0.69
avez       -0.68
ometown    -0.68
hold       -0.65
eneg       -0.65
lain       -0.64
affer      -0.64
Shar       -0.63
Positive Logits
logic      1.31
Logic      0.98
droid      0.84
istically  0.83
appl       0.81
matical    0.76
ynes       0.76
idi        0.72
reasoning  0.71
matically  0.71
Activations Density: 0.007%