Explanations
the word "logic" or its derivatives
logical reasoning or arguments
references to logic and reasoning
Negative Logits
eneg        -0.72
orks        -0.72
Volunte     -0.72
semble      -0.71
ometown     -0.70
Shar        -0.68
affer       -0.68
lain        -0.66
hold        -0.63
national    -0.63
Positive Logits
logic             1.18
Logic             0.94
matical           0.87
DragonMagazine    0.82
istically         0.79
matically         0.79
reasoning         0.74
istical           0.73
guiActiveUn       0.73
ãĤ¶               0.72
Activations Density: 0.004%