Explanations
information related to reasoning and argumentation
significant numerical values or statistics related to critical issues
Negative Logits
Token       Logit
Custom      -0.64
Instr       -0.63
handshake   -0.63
"+          -0.63
\"          -0.63
Likes       -0.62
swearing    -0.61
Guinness    -0.61
slang       -0.60
"-          -0.60
Positive Logits
Token         Logit
underscores   0.92
illuminating  0.92
empir         0.90
underscore    0.87
empirical     0.87
miscon        0.86
illuminate    0.83
orthy         0.83
undersc       0.82
troubling     0.82
Activations Density 0.620%