INDEX
Explanations
words related to court proceedings, opinions, scientific findings and apathy
Negative Logits
}             -0.44
sfor          -0.43
diligence     -0.43
ellites       -0.42
non           -0.40
morm          -0.39
DebuggerStep  -0.38
Non           -0.38
tr            -0.38
aroa          -0.37
Positive Logits
Theſe       0.97
Monfieur    0.92
myſelf      0.92
itſelf      0.89
greateſt    0.88
Efq         0.88
ſta         0.88
Diſ         0.87
Jefus       0.85
themſelves  0.84
Activations Density 2.826%