Explanations
phrases related to personal beliefs or opinions
phrases expressing personal thoughts and opinions
Negative Logits
ablished    -0.72
istor       -0.63
igers       -0.62
iken        -0.62
earable     -0.62
owered      -0.62
alking      -0.59
restling    -0.58
ocument     -0.57
EVA         -0.56
Positive Logits
)</         1.33
!)          1.19
-)          1.17
*)          1.15
)}          1.00
)           0.99
)|          0.97
)'          0.96
>)          0.96
)!          0.96
Activations Density 0.384%